Discriminating similar languages: experiments with linear SVMs and neural networks
نویسندگان
چکیده
This paper describes the systems we experimented with for participating in the discriminating between similar languages (DSL) shared task 2016. We submitted results of a single system based on support vector machines (SVM) with linear kernel and using character ngram features, which obtained the first rank at the closed training track for test set A. Besides the linear SVM, we also report additional experiments with a number of deep learning architectures. Despite our intuition that non-linear deep learning methods should be advantageous, linear models seem to fare better in this task, at least with the amount of data and the amount of effort we spent on tuning these models.
منابع مشابه
When Sparse Traditional Models Outperform Dense Neural Networks: the Curious Case of Discriminating between Similar Languages
We present the results of our participation in the VarDial 4 shared task on discriminating closely related languages. Our submission includes simple traditional models using linear support vector machines (SVMs) and a neural network (NN). The main idea was to leverage language group information. We did so with a two-layer approach in the traditional model and a multi-task objective in the neura...
متن کاملDiscriminating between Similar Languages with Word-level Convolutional Neural Networks
Discriminating between Similar Languages (DSL) is a challenging task addressed at the VarDial Workshop series. We report on our participation in the DSL shared task with a two-stage system. In the first stage, character n-grams are used to separate language groups, then specialized classifiers distinguish similar language varieties. We have conducted experiments with three system configurations...
متن کاملDiscrimination between Similar Languages, Varieties and Dialects using CNN- and LSTM-based Deep Neural Networks
In this paper, we describe a system (CGLI) for discriminating similar languages, varieties and dialects using convolutional neural networks (CNNs) and long short-term memory (LSTM) neural networks. We have participated in the Arabic dialect identification sub-task of DSL 2016 shared task for distinguishing different Arabic language texts under closed submission track. Our proposed approach is l...
متن کاملEstimation of Hand Skeletal Postures by Using Deep Convolutional Neural Networks
Hand posture estimation attracts researchers because of its many applications. Hand posture recognition systems simulate the hand postures by using mathematical algorithms. Convolutional neural networks have provided the best results in the hand posture recognition so far. In this paper, we propose a new method to estimate the hand skeletal posture by using deep convolutional neural networks. T...
متن کاملProbabilistic Contaminant Source Identification in Water Distribution Infrastructure Systems
Large water distribution systems can be highly vulnerable to penetration of contaminant factors caused by different means including deliberate contamination injections. As contaminants quickly spread into a water distribution network, rapid characterization of the pollution source has a high measure of importance for early warning assessment and disaster management. In this paper, a methodology...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016